
Bachelor Thesis Project

Evaluating the effectiveness of Twitter Sentiment Analysis as a predictive tool for the stock market.

Aim

The goal of this thesis was to build a time series representing the sentiment polarity of tweets related to a selected group of companies, and compare it to the corresponding time series of their stock market behavior.

The selected companies were: Apple, Google, Nike, Nestlé, Beyond Meat, Bayer, and NovaVax.
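The sentiment time series described in the aim can be illustrated with a small sketch. The thesis code was written in R and is not reproduced here; the Python below is only an assumption about one reasonable way to build such a series, namely averaging per-tweet polarity by calendar day and keeping the days that also have a closing price. The function names `daily_sentiment` and `align` are hypothetical, and the values are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_sentiment(scored_tweets):
    """Average per-tweet polarity scores by calendar day.

    scored_tweets: iterable of (date, polarity) pairs.
    Returns a list of (date, mean_polarity) sorted by date.
    """
    by_day = defaultdict(list)
    for day, polarity in scored_tweets:
        by_day[day].append(polarity)
    return [(day, mean(scores)) for day, scores in sorted(by_day.items())]


def align(sentiment_series, closing_prices):
    """Keep only the days present in both series (for example, trading days)."""
    prices = dict(closing_prices)
    return [(day, s, prices[day]) for day, s in sentiment_series if day in prices]


# Made-up illustrative values, not data from the thesis.
tweets = [
    (date(2021, 3, 1), 1.0),
    (date(2021, 3, 1), -0.5),
    (date(2021, 3, 2), 0.5),
    (date(2021, 3, 3), -1.0),
]
closes = [(date(2021, 3, 1), 100.0), (date(2021, 3, 2), 101.5)]

series = daily_sentiment(tweets)   # one (day, mean polarity) pair per day
paired = align(series, closes)     # days with both sentiment and a close
```

Averaging is only one possible aggregation; a sum or a share of positive tweets would produce a different series, and the page does not say which one the thesis used.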

Data Acquisition

I collected tweets using Twitter's API, writing all the code in R. I automated the data collection process using Windows Task Scheduler, ensuring that tweets were downloaded daily at the same time.

The downloaded data was automatically uploaded to OneDrive for remote access. Additionally, I implemented an automated email notification system via Gmail that confirmed the successful execution of the download and included basic statistics about the collected data.

Sentiment Analysis

I preprocessed the data by cleaning and lemmatizing the tweets, and then applied three sentiment analysis methods to compute a polarity score for each tweet:

Naive Bayes
Based on Bayes’ Theorem, this classifier labeled tweets as "positive" or "negative" using the MPQA Subjectivity Lexicon by Janyce Wiebe.
Syuzhet
This method used the Syuzhet R package and its associated dictionary to assign sentiment scores.
Udpipe
This method applied the Udpipe R package with the MPQA lexicon. It supports intensifiers, weakeners, and other modifiers such as negation, which lets it distinguish between phrases like "good", "very good", "quite good", and "not good".
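To make the idea of modifier-aware scoring concrete, here is a minimal sketch in Python. It is not the Udpipe implementation and does not use the MPQA lexicon: the word lists, the weights, and the two-token look-back window are all assumptions chosen for illustration, and `polarity_score` is a hypothetical name.

```python
# Toy lexicon and modifier tables: illustrative values only, not the MPQA
# lexicon and not the weights used by the udpipe package.
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}
AMPLIFIERS = {"very": 1.5, "really": 1.5}
WEAKENERS = {"quite": 0.75, "somewhat": 0.5}
NEGATORS = {"not", "never", "no"}


def polarity_score(text, window=2):
    """Sum the polarity of lexicon words, adjusted by nearby modifiers.

    Each polarity word is modified by the tokens found in the `window`
    positions before it: negators flip the sign, amplifiers and weakeners
    scale the magnitude. Tokenization is a plain whitespace split, so
    punctuation attached to a word is not handled.
    """
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        score = POLARITY[token]
        for prev in tokens[max(0, i - window):i]:
            if prev in NEGATORS:
                score = -score
            elif prev in AMPLIFIERS:
                score *= AMPLIFIERS[prev]
            elif prev in WEAKENERS:
                score *= WEAKENERS[prev]
        total += score
    return total


scores = {p: polarity_score(p) for p in ["good", "very good", "quite good", "not good"]}
```

With these toy weights the four example phrases receive four different scores, ordered from "not good" (lowest) to "very good" (highest), which is the distinction a plain word-count approach cannot make.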

Conclusion

To evaluate whether tweet sentiment helps predict a company’s stock price, I developed a custom test based on the Granger Causality Test, which I named the Close Test. Causality here is meant in the Granger sense: past sentiment values improve the forecast of the price beyond what past prices alone provide. The results were promising, revealing several statistically significant relationships of this kind.

Interestingly, the test found that:

Tweets specifically referencing a company’s stock symbol (e.g., $AAPL, $GOOGL), referred to as the "stock" dataset, were more useful for short-term predictions (same-day closing price).
Tweets mentioning the company name more generally (e.g., Apple, Google), referred to as the "score" dataset, were better suited for forecasting stock behavior over the following days.
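The Close Test itself is not described on this page, so the sketch below shows only the standard idea it builds on, not the custom variant. A Granger-style test fits two regressions, one predicting y from its own lags and one that also includes lags of x, and compares their residual sums of squares with an F-statistic. The Python code, the function names, and the simulated data are all assumptions for illustration; in R, an established implementation is `grangertest` in the lmtest package.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(rows, y):
    """Residual sum of squares of an ordinary least squares fit of y on the design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def granger_f(x, y, lag=1):
    """F-statistic for the hypothesis that lagged x improves the prediction of y."""
    times = range(lag, len(y))
    target = y[lag:]
    # Restricted model: intercept plus lags of y only.
    own = [[1.0] + [y[t - j] for j in range(1, lag + 1)] for t in times]
    # Unrestricted model: the same columns plus lags of x.
    full = [row + [x[t - j] for j in range(1, lag + 1)] for row, t in zip(own, times)]
    rss_r, rss_u = rss(own, target), rss(full, target)
    dof = len(target) - len(full[0])
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Simulated data (not thesis data): y follows x with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]

f_x_to_y = granger_f(x, y)  # large: past x helps predict y
f_y_to_x = granger_f(y, x)  # small: past y does not help predict x
```

Turning the F-statistic into a p-value requires the F distribution with `lag` and `dof` degrees of freedom, which this sketch leaves out to stay within the standard library.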

A more detailed analysis, including the visualizations and findings, is available in the final report and in the corresponding GitHub repository.

Davide Giardini

Data Scientist & AI Developer

I focus on LLMs, Deep Learning
and Knowledge Graphs.
